Add subgroup wasm gating example#797

Open
ksgr5566 wants to merge 1 commit intomlc-ai:mainfrom
ksgr5566:webgpu-subgroups

Conversation

@ksgr5566

Summary

Adds an examples/wasm-gating example showing how to route between baseline and subgroup WebGPU WASM libraries in WebLLM.

  • checks adapter.features.has("subgroups") to detect WebGPU subgroup support
  • adds subgroup-aware model_lib selection based on WebGPU adapter support
  • switches to -subgroups.wasm when subgroup support is available
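The routing described above can be reduced to a small pure function. This is an illustrative sketch, not the example's actual code; the function name and the modelLibBase parameter are assumptions:

```typescript
// Pick the model library URL based on whether the WebGPU adapter
// reports subgroup support. `features` would come from
// `(await navigator.gpu.requestAdapter()).features` in the browser.
function selectModelLib(
  features: ReadonlySet<string>,
  modelLibBase: string,
): string {
  // Route to the subgroup-optimized build only when the adapter
  // actually reports the "subgroups" feature; otherwise fall back
  // to the baseline build.
  return features.has("subgroups")
    ? `${modelLibBase}-subgroups.wasm`
    : `${modelLibBase}.wasm`;
}
```

Keeping the decision in one pure function makes the gating easy to unit-test without a real GPU adapter.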

Testing

  • verified that the example routes from .wasm to -subgroups.wasm when the adapter reports subgroups


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new example, wasm-gating, which demonstrates capability-based routing between baseline and subgroup WebGPU WASM builds. The addition includes a TypeScript implementation, HTML structure, and documentation. Feedback was provided to generalize the comments regarding logit_bias token IDs, as the current descriptions are specific to a different model version and could be misleading if the model or tokenizer is updated.

Comment on lines +88 to +104
const reply0 = await engine.chat.completions.create({
  messages: [{ role: "user", content: "List three US states." }],
  // below configurations are all optional
  n: 3,
  temperature: 1.5,
  max_tokens: 256,
  // 46510 and 7188 are "California", and 8421 and 51325 are "Texas" in Llama-3.1-8B-Instruct
  // So we would have a higher chance of seeing the latter two, but never the first in the answer
  logit_bias: {
    "46510": -100,
    "7188": -100,
    "8421": 5,
    "51325": 5,
  },
  logprobs: true,
  top_logprobs: 2,
});

medium

The comments explaining the specific token IDs for "California" and "Texas" are highly model-dependent (Llama-3.1-8B-Instruct). This makes the example less portable and the comments could quickly become outdated or misleading if the model or tokenizer changes. Consider making these comments more generic about the purpose of logit_bias rather than detailing specific token values, or moving such model-specific details to external documentation if necessary.

    // Example of using logit_bias to influence token generation.
    // Specific token IDs and their corresponding words are model-dependent.
    logit_bias: {
      "46510": -100,
      "7188": -100,
      "8421": 5,
      "51325": 5,
    },

const modelRecord = webllm.prebuiltAppConfig.model_list.find(
  (entry) => entry.model_id === selectedModel,
);
const appConfig =
Collaborator

We also want to enforce subgroupMinSize <= 32 <= subgroupMaxSize and maxComputeInvocationsPerWorkgroup = 1024 for the subgroup wasm path
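A hedged sketch of what that extra check might look like. The wrapper function is illustrative, and reading "maxComputeInvocationsPerWorkgroup = 1024" as requiring at least 1024 is an assumption; subgroupMinSize, subgroupMaxSize, and maxComputeInvocationsPerWorkgroup are WebGPU adapter limit names:

```typescript
// Limits the gate depends on; in the browser these would be read
// from `adapter.limits` of a WebGPU adapter.
interface SubgroupGateLimits {
  subgroupMinSize: number;
  subgroupMaxSize: number;
  maxComputeInvocationsPerWorkgroup: number;
}

// Returns true only when the adapter supports subgroups AND the
// reviewer's constraints hold: a subgroup size of 32 must fall in
// [subgroupMinSize, subgroupMaxSize], and the workgroup invocation
// limit must be at least 1024 (assumed interpretation).
function canUseSubgroupWasm(
  hasSubgroups: boolean,
  limits: SubgroupGateLimits,
): boolean {
  return (
    hasSubgroups &&
    limits.subgroupMinSize <= 32 &&
    32 <= limits.subgroupMaxSize &&
    limits.maxComputeInvocationsPerWorkgroup >= 1024
  );
}
```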
